Unsupervised Audio Scene Analysis

نویسنده

  • Chris Stauffer
چکیده

Motivation: While many company’s are expending great effort in the field of automatic speech recognition(ASR), little attention is being paid to general audio and long-term modeling of audio in general. Even an ASR system which could give a complete transcription of the words heard in an environment would lack vital information. e.g., who was talking, when they were talking, what was the tone of the conversation, did someone slam the door, did someone use a harsh expletive, when do these conversations tend to happen, etc. In fact, this information is potentially of great value even without the transcription of what was said.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Traffic Scene Analysis using Hierarchical Sparse Topical Coding

Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...

متن کامل

Singing Voice Separation Using Spectro-Temporal Modulation Features

An auditory-perception inspired singing voice separation algorithm for monaural music recordings is proposed in this paper. Under the framework of computational auditory scene analysis (CASA), the music recordings are first transformed into auditory spectrograms. After extracting the spectral-temporal modulation contents of the timefrequency (T-F) units through a two-stage auditory model, we de...

متن کامل

Anti-social Behavior Detection in Audio-Visual Surveillance Systems

In this paper we propose a general purpose framework for detection of unusual events. The proposed system is based on the unsupervised method for unusual scene detection in web–cam images that was introduced in [1]. We extend their algorithm to accommodate data from different modalities and introduce the concept of time-space blocks. In addition, we evaluate early and late fusion techniques for...

متن کامل

Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation

We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permi...

متن کامل

A scheme for racquet sports video analysis with the combination of audio-visual information

As a very important category in sports video, racquet sports video, e.g. table tennis, tennis and badminton, has been paid little attention in the past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generating based on the combination of audio and visual information. Firstly, a supervised classification method is...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001